---
title: "Phase 1: PSI Construction & Aggregation Sensitivity"
subtitle: "Power-Sharing Index Methodological Documentation"
author: "Jessala A. Grijalva"
date: "`r format(Sys.Date(), '%B %d, %Y')`"
format:
  pdf:
    toc: true
    toc-depth: 3
    number-sections: true
    geometry: margin=1in
    colorlinks: true
    keep-tex: false
    fig-width: 8
    fig-height: 5
    code-overflow: wrap
execute:
  echo: true
  warning: false
  message: false
---

```{r}
#| label: setup
#| echo: false

source(here::here("R", "00_setup.R"))

# ── Load V-Dem Data ──────────────────────────────────────────────────────────
vdem <- readRDS(here("data", "raw", "V-Dem-CY-Full+Others-v15.rds"))

usa <- vdem |>
  filter(country_name == "United States of America") |>
  arrange(year) |>
  select(
    year, country_name,
    all_of(PSI_VARS),
    all_of(VDEM_INDICES),
    v2x_suffr
  )

# ── Construct PSI Under Three Aggregation Methods ────────────────────────────
usa_psi <- usa |>
  mutate(
    across(all_of(PSI_VARS), normalize_minmax, .names = "{.col}_norm"),
    polyarchy_norm = normalize_minmax(v2x_polyarchy)
  ) |>
  mutate(
    psi_additive = (v2pepwrsoc_norm + v2pepwrgen_norm + v2clsocgrp_norm +
                    v2cltort_norm + v2clkill_norm) / 5,
    psi_multiplicative = (pmax(v2pepwrsoc_norm, EPS) * pmax(v2pepwrgen_norm, EPS) *
                          pmax(v2clsocgrp_norm, EPS) * pmax(v2cltort_norm, EPS) *
                          pmax(v2clkill_norm, EPS))^(1/5),
    psi_hybrid = polyarchy_norm * psi_multiplicative,
    era = assign_era(year)
  )
```

\newpage

# Introduction

This document constructs the Power-Sharing Index (PSI) for the United States and presents sensitivity analyses of its specification choices. The hybrid aggregation method is selected, and the substantive conclusions are shown to hold across the aggregation methods, component subsets, and alternative specifications tested here.

**Output:** `data/processed/psi_phase1_results.rda` (loaded by Phase 2 and Appendix).

## Relationship to Existing Approaches

This analysis follows the general framework established by Sigman and Lindberg (2019) for constructing composite democracy indices, with two key departures:

1. **Aggregation transparency**: Sigman and Lindberg do not present sensitivity analyses for their aggregation choices. This document explicitly compares three aggregation methods and demonstrates that results are robust across specifications.

2. **Component selection logic**: Sigman and Lindberg recommend avoiding V-Dem variables that appear in existing indices to prevent circularity. This concern does not apply here because PSI measures a conceptually distinct phenomenon---cross-group power transfer rather than procedural democracy quality. The discriminant validity analysis in Phase 2 confirms this empirically: PSI shows a *negative* correlation with Electoral Democracy during the Herrenvolk era (1789--1865), demonstrating that it captures something qualitatively different from existing indices.

\newpage

# Aggregation Methods: Theoretical Rationale

Three aggregation approaches are considered, each with distinct theoretical implications for how exclusion across dimensions should be weighted.

## Additive (Arithmetic Mean)

$$PSI_{add} = \frac{1}{5}\sum_{i=1}^{5} X_{i,norm}$$

**Theoretical logic**: High scores in some dimensions can compensate for low scores in others. A society with strong gender equality but weak racial equality would score moderately.

**Implication**: Partial inclusion in some dimensions can "offset" exclusion in others.
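All three formulas assume each component has already been rescaled to $[0, 1]$. The project's `normalize_minmax()` is defined in `R/00_setup.R` and is not shown in this document. The sketch below assumes it is standard min-max rescaling by the series' own observed minimum and maximum, and uses a hypothetical stand-in, `minmax_sketch()`, on toy values to show normalization followed by the additive mean.

```r
# Hypothetical stand-in for normalize_minmax() from R/00_setup.R, assuming it
# rescales a series by its own observed minimum and maximum.
minmax_sketch <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # Note: a constant series gives 0 / 0 = NaN here; the real helper may differ.
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy raw scores for five components over three years (not V-Dem data)
toy_raw <- data.frame(
  c1 = c(-2.0, 0.0, 2.0),
  c2 = c(-1.0, 2.5, 3.0),
  c3 = c(0.5, 1.5, 2.5),
  c4 = c(-3.0, -1.0, 1.0),
  c5 = c(1.0, 1.5, 3.0)
)

# Rescale each component to [0, 1], then take the row-wise arithmetic mean
toy_norm <- as.data.frame(lapply(toy_raw, minmax_sketch))
psi_add_toy <- rowMeans(toy_norm)
psi_add_toy
# [1] 0.000 0.525 1.000
```

Under this assumption, each component's lowest observed year maps to 0 and its highest to 1. Normalized scores are therefore relative to the range of the series being normalized (here, the U.S. series) rather than to a fixed cross-national scale.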

## Multiplicative (Geometric Mean)

$$PSI_{mult} = \left(\prod_{i=1}^{5} X_{i,norm}\right)^{1/5}$$

**Theoretical logic**: Low values in *any* dimension pull down the overall score. A society cannot "average its way out of exclusion."

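**Implication**: Power-sharing requires inclusion across *all* dimensions simultaneously. In the formula, a zero in any dimension produces zero overall. In the implementation, each component is first floored at a small constant `EPS` (`pmax(x, EPS)`), so a zero produces a small positive score rather than exactly zero.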
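Because of the `pmax(x, EPS)` floor in the setup chunk, the multiplicative score of a society fully excluded on one dimension is set largely by the value of `EPS`. The sketch below computes the floored geometric mean for one zero component under three illustrative floor values. `geo_mean_floored()` is a hypothetical helper, and the project's actual `EPS` is defined in `R/00_setup.R` and not reproduced here.

```r
# Hypothetical helper: geometric mean after flooring every component at eps,
# mirroring the pmax(x, EPS) step used to build psi_multiplicative.
geo_mean_floored <- function(x, eps) {
  x <- pmax(x, eps)        # values below eps (including exact zeros) become eps
  exp(mean(log(x)))        # equals prod(x)^(1/length(x)), computed on the log scale
}

components <- c(1.0, 0.0, 0.5, 0.5, 0.5)  # one dimension fully excluded

# Illustrative floors only; not the project's EPS
sapply(c(1e-2, 1e-3, 1e-6), function(e) geo_mean_floored(components, e))
# approx. 0.263 0.166 0.042
```

Moving the floor from 0.01 to 0.000001 lowers the score from about 0.26 to about 0.04, so the chosen `EPS` matters for interpreting near-zero multiplicative and hybrid values.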

## Hybrid (Multiplicative $\times$ Polyarchy)

$$PSI_{hybrid} = Polyarchy_{norm} \times \left(\prod_{i=1}^{5} X_{i,norm}\right)^{1/5}$$

**Theoretical logic**: Combines the multiplicative penalty for exclusion with a procedural democracy cap. Even perfect inclusion cannot push the score above the normalized Polyarchy value.

**Implication**: Power-sharing is meaningful only within functioning democratic procedures. Authoritarian regimes with nominal inclusion still score low.
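The cap follows directly from the definitions. Every normalized component lies in $[0, 1]$, and so does their geometric mean. Multiplying a value by a second factor in $[0, 1]$ can only shrink it:

$$PSI_{hybrid} = Polyarchy_{norm} \times PSI_{mult} \le \min\left(Polyarchy_{norm},\ PSI_{mult}\right)$$

For example, with $Polyarchy_{norm} = 0.2$ and $PSI_{mult} = 0.9$, $PSI_{hybrid} = 0.18$, which is below both factors.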

\newpage

# Comparing Aggregation Methods

## Visual Comparison

```{r}
#| label: fig-compare-methods
#| fig-cap: "PSI by Aggregation Method"
#| fig-height: 6

usa_psi |>
  select(year, psi_additive, psi_multiplicative, psi_hybrid) |>
  pivot_longer(-year, names_to = "Method", values_to = "PSI") |>
  mutate(Method = case_when(
    Method == "psi_additive" ~ "Additive",
    Method == "psi_multiplicative" ~ "Multiplicative",
    Method == "psi_hybrid" ~ "Hybrid"
  ) |> factor(levels = c("Additive", "Multiplicative", "Hybrid"))) |>
  ggplot(aes(x = year, y = PSI, color = Method)) +
  geom_line(linewidth = 1) +
  geom_vline(xintercept = c(1865, 1920, 1965), linetype = "dashed", alpha = 0.4) +
  scale_color_manual(values = c("Additive" = "black",
                                "Multiplicative" = ELEC_COLOR,
                                "Hybrid" = PSI_COLOR)) +
  scale_y_continuous(limits = c(0, 1), labels = percent) +
  labs(title = "PSI by Aggregation Method",
       subtitle = "All three methods show same substantive pattern; differ in magnitude",
       x = NULL, y = "Power-Sharing Index") +
  theme_psi()
```

\noindent \textit{Note.} Vertical dashed lines mark the end of the Civil War (1865), women's suffrage (1920), and the Voting Rights Act (1965). All three methods produce the same substantive pattern and differ mainly in magnitude.

## Era-Level Summary Statistics

```{r}
#| label: tbl-era-summary

usa_psi |>
  group_by(era) |>
  summarise(
    N = n(),
    `Additive Mean` = mean(psi_additive, na.rm = TRUE),
    `Multiplicative Mean` = mean(psi_multiplicative, na.rm = TRUE),
    `Hybrid Mean` = mean(psi_hybrid, na.rm = TRUE),
    .groups = "drop"
  ) |>
  gt() |>
  tab_header(
    title = "PSI by Era and Aggregation Method",
    subtitle = "All methods show same pattern: near-zero → low → high"
  ) |>
  fmt_number(columns = -c(era, N), decimals = 2)
```

\noindent \textit{Note.} Means rounded to 2 decimal places. All methods show the same ordering across eras.

## Correlation Across Methods

```{r}
#| label: tbl-method-correlations

tibble(
  Comparison = c("Additive vs. Multiplicative",
                 "Additive vs. Hybrid",
                 "Multiplicative vs. Hybrid"),
  Correlation = c(
    cor(usa_psi$psi_additive, usa_psi$psi_multiplicative, use = "complete.obs"),
    cor(usa_psi$psi_additive, usa_psi$psi_hybrid, use = "complete.obs"),
    cor(usa_psi$psi_multiplicative, usa_psi$psi_hybrid, use = "complete.obs")
  )
) |>
  gt() |>
  tab_header(
    title = "Correlation Across Aggregation Methods",
    subtitle = "High correlations indicate robust measurement"
  ) |>
  fmt_number(columns = Correlation, decimals = 3)
```

\noindent \textit{Note.} Pearson correlations computed on complete cases across the full time series (1789--2024).

\newpage

# Why Hybrid? Theoretical Justification

The hybrid method is preferred for two theoretical reasons:

## Multiplicative Penalty for Exclusion

The multiplicative component ensures that exclusion in *any* dimension reduces the overall score. This reflects the theoretical claim that power-sharing requires inclusion across multiple dimensions simultaneously.

Consider a hypothetical society with perfect gender equality (1.0), complete racial exclusion (0.0), and middling scores (0.5) on the remaining three components. For the multiplicative score, the zero is floored at 0.001:

- **Additive**: $(1.0 + 0.0 + 0.5 + 0.5 + 0.5) / 5 = 0.50$ (moderate score)
- **Multiplicative**: $(1.0 \times 0.001 \times 0.5 \times 0.5 \times 0.5)^{1/5} \approx 0.17$ (low score)

The multiplicative approach scores this society at roughly one third of its additive value, identifying it as a society with minimal cross-group power-sharing.
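The arithmetic in this example can be checked directly. The component names below are hypothetical labels for illustration. The floor of 0.001 is the value used in the worked example, which may or may not equal the project's `EPS`.

```r
# Hypothetical example: one fully excluded dimension, one fully included,
# three middling. Names are illustrative labels only.
x <- c(gender = 1.0, social_group = 0.0, civil_liberties = 0.5,
       torture = 0.5, killings = 0.5)

additive <- mean(x)                                 # arithmetic mean
multiplicative <- prod(pmax(x, 0.001))^(1/5)        # floored geometric mean

round(c(additive = additive, multiplicative = multiplicative), 2)
# additive = 0.50, multiplicative = 0.17
```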

## Procedural Democracy Cap

The Polyarchy multiplier ensures that PSI cannot exceed the quality of democratic institutions. This addresses a potential concern: could an authoritarian regime with nominal inclusion score highly?

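The hybrid formulation prevents this. Because the multiplicative score is at most 1, a year with normalized Polyarchy (`polyarchy_norm`) of 0.2 cannot exceed PSI = 0.2, regardless of inclusion scores. Note that the cap is the min-max normalized Polyarchy score, not the raw V-Dem value. Power-sharing is thus measured only within functioning democratic procedures.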
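The sketch below shows what the cap means in practice, assuming `normalize_minmax()` rescales by the observed minimum and maximum. With perfect inclusion (a multiplicative score of 1), the hybrid score equals the normalized Polyarchy value, so the lowest-Polyarchy year in the series is capped at exactly zero. The Polyarchy values are hypothetical.

```r
# Hypothetical raw Polyarchy values for four years (illustrative, not V-Dem data)
polyarchy_raw <- c(0.05, 0.20, 0.45, 0.85)

# Min-max rescaling, assumed to match normalize_minmax() in R/00_setup.R
polyarchy_cap <- (polyarchy_raw - min(polyarchy_raw)) /
  (max(polyarchy_raw) - min(polyarchy_raw))

# Perfect inclusion on every component: multiplicative score of 1
psi_mult_perfect <- 1
psi_hybrid_perfect <- polyarchy_cap * psi_mult_perfect

round(psi_hybrid_perfect, 4)
# [1] 0.0000 0.1875 0.5000 1.0000
```

A raw Polyarchy of 0.20 becomes a cap of 0.1875 here, because the cap is measured relative to the range of the series rather than on V-Dem's original scale.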

## Empirical Demonstration

```{r}
#| label: fig-hybrid-behavior
#| fig-cap: "How the Hybrid Method Works"
#| fig-height: 5

usa_psi |>
  ggplot(aes(x = year)) +
  geom_line(aes(y = psi_multiplicative, color = "Multiplicative (uncapped)"), linewidth = 0.8) +
  geom_line(aes(y = psi_hybrid, color = "Hybrid (Polyarchy-capped)"), linewidth = 0.8) +
  geom_line(aes(y = polyarchy_norm, color = "Polyarchy (cap)"), linewidth = 0.6, linetype = "dashed") +
  scale_color_manual(values = c("Multiplicative (uncapped)" = ELEC_COLOR,
                                "Hybrid (Polyarchy-capped)" = PSI_COLOR,
                                "Polyarchy (cap)" = "gray50")) +
  scale_y_continuous(limits = c(0, 1), labels = percent) +
  labs(title = "How the Hybrid Method Works",
       subtitle = "Multiplicative score is capped by procedural democracy quality",
       x = NULL, y = "Index Value", color = NULL) +
  theme_psi()
```

\noindent \textit{Note.} The hybrid PSI (red) lies on or below both the multiplicative score (blue) and the normalized Polyarchy score (gray dashed), because it is the product of two values between 0 and 1. This ensures power-sharing is measured within democratic procedures.

\newpage

# Sensitivity Analysis: Jackknife (Leave-One-Component-Out)

To assess whether any single component drives the results, each component is dropped in turn and the additive index is recalculated as the mean of the remaining four.

```{r}
#| label: fig-jackknife
#| fig-cap: "Jackknife Sensitivity Analysis"
#| fig-height: 6

usa_psi |>
  mutate(
    Full = psi_additive,
    `Drop: Social Group` = (v2pepwrgen_norm + v2clsocgrp_norm + v2cltort_norm + v2clkill_norm) / 4,
    `Drop: Gender` = (v2pepwrsoc_norm + v2clsocgrp_norm + v2cltort_norm + v2clkill_norm) / 4,
    `Drop: Civil Liberties` = (v2pepwrsoc_norm + v2pepwrgen_norm + v2cltort_norm + v2clkill_norm) / 4,
    `Drop: Torture` = (v2pepwrsoc_norm + v2pepwrgen_norm + v2clsocgrp_norm + v2clkill_norm) / 4,
    `Drop: Killings` = (v2pepwrsoc_norm + v2pepwrgen_norm + v2clsocgrp_norm + v2cltort_norm) / 4
  ) |>
  select(year, Full, starts_with("Drop")) |>
  pivot_longer(-year, names_to = "Version", values_to = "PSI") |>
  mutate(Version = factor(Version, levels = c("Full", "Drop: Social Group", "Drop: Gender",
                                               "Drop: Civil Liberties", "Drop: Torture", "Drop: Killings"))) |>
  ggplot(aes(x = year, y = PSI, color = Version, linewidth = Version)) +
  geom_line() +
  scale_color_manual(values = c("Full" = "black", "Drop: Social Group" = "#1b9e77", "Drop: Gender" = "#d95f02",
                                "Drop: Civil Liberties" = "#7570b3", "Drop: Torture" = "#e7298a", "Drop: Killings" = "#66a61e")) +
  scale_linewidth_manual(values = c(1.5, rep(0.6, 5))) +
  scale_y_continuous(limits = c(0, 1), labels = percent) +
  labs(title = "Jackknife Sensitivity Analysis",
       subtitle = "Dropping any single component does not change substantive pattern",
       x = NULL, y = "PSI (Additive)") +
  theme_psi() +
  guides(linewidth = "none")
```

\noindent \textit{Note.} Black line shows the full five-component index. Colored lines show the index with each component dropped in turn. When a drop line falls above the full index, that component was pulling the score down.

## Interpretation

Key findings:

- **Gender and Civil Liberties** pull scores down most in early periods (as expected---women's exclusion and racial exclusion)
- **Torture** pulls scores down in middle periods
- No single component drives the overall pattern
- Substantive conclusions (four eras, post-1965 shift) are unchanged when any single component is dropped from the additive index
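The jackknife above is computed on the additive index. The recommended specification aggregates multiplicatively, and there dropping a near-zero component has a much larger effect. Below is a minimal sketch of a leave-one-component-out routine that accepts any row-wise aggregator, so the same check could be run on the geometric mean. `jackknife_index()` and `geo_mean()` are hypothetical helpers, and the component matrix is toy data.

```r
# Hypothetical generic jackknife: recompute an index once per dropped component.
# `comp` is a numeric matrix of normalized components (rows = years).
jackknife_index <- function(comp, aggregate_fn) {
  drops <- matrix(NA_real_, nrow = nrow(comp), ncol = ncol(comp),
                  dimnames = list(NULL, paste0("Drop: ", colnames(comp))))
  for (j in seq_len(ncol(comp))) {
    # Aggregate each row over the remaining components
    drops[, j] <- apply(comp[, -j, drop = FALSE], 1, aggregate_fn)
  }
  cbind(Full = apply(comp, 1, aggregate_fn), drops)
}

# Floored geometric mean (0.001 is an illustrative floor, not the project's EPS)
geo_mean <- function(r, eps = 0.001) exp(mean(log(pmax(r, eps))))

# Toy normalized components for three years (illustrative values only)
comp <- matrix(
  c(0.9, 0.1, 0.2, 0.6, 0.7,
    0.9, 0.4, 0.5, 0.7, 0.8,
    1.0, 0.9, 0.9, 0.9, 1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("Social Group", "Gender", "Civil Liberties",
                          "Torture", "Killings"))
)

round(jackknife_index(comp, mean), 3)      # additive, as in the figure
round(jackknife_index(comp, geo_mean), 3)  # multiplicative
```

For the hybrid, each column of the geometric-mean result would be multiplied by `polyarchy_norm`, since the cap does not depend on which component is dropped.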

\newpage

# Alternative Specifications

Beyond the three primary methods, additional specifications are tested to ensure robustness.

```{r}
#| label: fig-alt-specs
#| fig-cap: "Alternative Specifications"
#| fig-height: 6

usa_psi |>
  mutate(
    Additive = psi_additive,
    Geometric = psi_multiplicative,
    Hybrid = psi_hybrid,
    Minimum = pmin(v2pepwrsoc_norm, v2pepwrgen_norm, v2clsocgrp_norm, v2cltort_norm, v2clkill_norm),
    `Weighted (2x Power)` = (2*v2pepwrsoc_norm + 2*v2pepwrgen_norm + v2clsocgrp_norm + v2cltort_norm + v2clkill_norm) / 7
  ) |>
  select(year, Additive, Geometric, Hybrid, Minimum, `Weighted (2x Power)`) |>
  pivot_longer(-year, names_to = "Specification", values_to = "PSI") |>
  ggplot(aes(x = year, y = PSI, color = Specification)) +
  geom_line(linewidth = 0.7) +
  scale_y_continuous(limits = c(0, 1), labels = percent) +
  labs(title = "Alternative Specifications",
       subtitle = "All specifications show same substantive pattern",
       x = NULL, y = "PSI") +
  theme_psi()
```

\noindent \textit{Note.} All five specifications produce the same substantive periodization, differing mainly in magnitude.

## Specifications Tested

| Specification | Formula | Rationale |
|----|----|----|
| Additive | Arithmetic mean of 5 components | Standard approach |
| Geometric | Geometric mean of 5 components | Penalizes any-dimension exclusion |
| Hybrid | Geometric $\times$ Polyarchy | Adds procedural cap |
| Minimum | Lowest component score | Maximum penalty for exclusion |
| Weighted (2x Power) | Double-weights power distribution | Tests sensitivity to weighting |
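The claim that all specifications share a substantive pattern is assessed visually in the figure above. One way to summarize it numerically is a rank correlation across specifications. This measure ignores level differences and asks only whether years are ordered the same way. The sketch below uses simulated series, not PSI output. Applied to the real data, the columns would be the specification columns computed in the figure chunk.

```r
# Sketch: numeric summary of agreement between specifications.
# Rows are years, columns are specifications; values are simulated, not PSI results.
set.seed(42)
base <- sort(runif(50))                 # a shared upward trend
specs <- cbind(
  Additive  = base + rnorm(50, sd = 0.02),
  Geometric = base^1.5 + rnorm(50, sd = 0.02),
  Minimum   = base^3 + rnorm(50, sd = 0.02)
)

# Spearman correlation compares rank orderings, so it is insensitive to the
# level differences between specifications that the figure shows.
round(cor(specs, method = "spearman"), 2)
```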

\newpage

# Response to Historical Events

A properly constructed index should respond to known historical events that affected cross-group power distribution.

```{r}
#| label: tbl-events

events <- tribble(
  ~Event, ~Year_Before, ~Year_After, ~Expected,
  "19th Amendment (1920)", 1919, 1921, "Increase",
  "Voting Rights Act (1965)", 1964, 1966, "Major Increase",
  "Shelby County v. Holder (2013)", 2012, 2014, "Decline"
)

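# Look up one year's PSI under a named aggregation method. match.arg() rejects
# misspelled methods instead of silently falling back to the hybrid column.
get_psi <- function(yr, method = c("additive", "multiplicative", "hybrid")) {
  method <- match.arg(method)
  val <- usa_psi[[paste0("psi_", method)]][usa_psi$year == yr]
  if (length(val) == 0) NA_real_ else val
}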

events |>
  rowwise() |>
  mutate(
    PSI_Before = get_psi(Year_Before),
    PSI_After = get_psi(Year_After),
    Change = PSI_After - PSI_Before,
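    # Arrow by sign of change: a zero change gets a flat arrow, and NA stays NA
    Direction = case_when(Change > 0 ~ "\u2191", Change < 0 ~ "\u2193", Change == 0 ~ "\u2192")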
  ) |>
  ungroup() |>
  select(Event, Expected, PSI_Before, PSI_After, Change, Direction) |>
  gt() |>
  tab_header(
    title = "PSI Response to Historical Events",
    subtitle = "Index moves in expected direction for all major events"
  ) |>
  fmt_number(columns = c(PSI_Before, PSI_After, Change), decimals = 3)
```

\noindent \textit{Note.} PSI values from the additive specification. All three events produce changes in the expected direction, providing construct validity evidence.
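The events table calls `get_psi()` once per row inside `rowwise()`. A vectorized alternative uses base R's `match()` to find each year's row, returning `NA` when a year is absent. `psi_at()` is a hypothetical helper, and the data frame is a toy stand-in for `usa_psi`.

```r
# Hypothetical helper: value of `col` in each requested year, NA if absent
psi_at <- function(df, years, col) df[[col]][match(years, df$year)]

# Toy stand-in for usa_psi (artificial values, not PSI results)
toy_psi <- data.frame(
  year = 2010:2015,
  psi_additive = seq(0.1, 0.6, by = 0.1)
)

events_toy <- data.frame(Year_Before = c(2012, 2019), Year_After = c(2014, 2021))

# One vectorized lookup per column instead of one function call per row
events_toy$Change <- psi_at(toy_psi, events_toy$Year_After, "psi_additive") -
  psi_at(toy_psi, events_toy$Year_Before, "psi_additive")
events_toy$Change  # about 0.2 for the first event; NA where a year is missing
```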

\newpage

# Summary

## Key Findings

1. **Robustness**: All aggregation methods produce the same substantive pattern. Conclusions do not depend on which of the tested aggregation methods or specifications is used.

2. **Jackknife stability**: No single component drives results. The index is not an artifact of any particular variable.

3. **Construct validity**: The index responds appropriately to known historical events (19th Amendment, VRA, Shelby County).

## Recommended Specification

**Hybrid (Multiplicative $\times$ Polyarchy)** is recommended because:

- Multiplicative aggregation correctly penalizes exclusion in any dimension
- Polyarchy cap ensures power-sharing is measured within democratic procedures
- Results are robust to alternative specifications

## Comparison to Sigman & Lindberg (2019)

| Aspect | Sigman & Lindberg | This Analysis |
|----|----|----|
| Sensitivity analysis | Not presented | Full jackknife and alternative specifications |
| Aggregation choice | Single method | Three methods compared, robustness demonstrated |
| Component selection | Avoid existing indices | Selection based on conceptual fit; distinctiveness shown empirically |

\newpage

# Save Results

```{r}
#| label: save-results
#| echo: false

# Save constructed PSI data for Phase 2 and Appendix
psi_phase1 <- list(
  usa_psi       = usa_psi,
  usa_raw       = usa,
  psi_vars      = PSI_VARS,
  psi_labels    = PSI_LABELS,
  normalize_fn  = normalize_minmax,
  eps           = EPS
)

save(psi_phase1, file = here("data", "processed", "psi_phase1_results.rda"))
```

Output saved to `data/processed/psi_phase1_results.rda`. Render Phase 2 next.
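A downstream document can restore the saved list with `load()`. The round-trip sketch below uses a temporary file and a toy list in place of the real output. It shows that `load()` returns the names of the restored objects and can target a separate environment, so existing objects are not overwritten.

```r
# Toy object standing in for psi_phase1 (illustrative contents only)
demo_phase1 <- list(values = c(0.1, 0.2, 0.3), label = "toy object")

# A temporary file stands in for data/processed/psi_phase1_results.rda
path <- tempfile(fileext = ".rda")
save(demo_phase1, file = path)

# Load into a fresh environment so objects in the session are not overwritten
restored <- new.env()
loaded_names <- load(path, envir = restored)
loaded_names                   # "demo_phase1"
restored$demo_phase1$label     # "toy object"
```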

\newpage

# Session Info

```{r}
#| label: session
#| echo: false

sessionInfo()
```
